#According to CalEnviroScreen 4.0, the highest concentrations of PM2.5 are observed in the cities of Oakland and Napa. These cities had annual mean PM2.5 concentrations above 16 µg/m3 over 2015-2017.

#Note: PM2.5 annual mean monitoring data were extracted for all monitoring sites in California from CARB’s air monitoring network database for the years 2015-2017, with the exception of the special purpose monitor at San Ysidro, where data were available only for 2015 and part of 2016.

#According to CalEnviroScreen 4.0, the highest rates of ED visits for asthma are observed in the cities of Vallejo and San Leandro. These cities recorded over 200 visits per 10,000 people over 2015-2017.

#Note: Records for ED visits occurring during 2015-2017 were obtained from OSHPD’s Emergency Department and Ambulatory Surgery files for patients listed as residing in California with a principal diagnosis of asthma.

#The best-fit line on the scatter plot above does not appear to represent the CalEnviroScreen data well. Many observations lie far above the line (large positive residuals), particularly for asthma visit rates above 100 per 10,000 people. The wide spread of these residuals contrasts with the large number of small residuals at visit rates below 50 per 10,000 people.

#An increase of 1 µg/m3 in PM2.5 concentration is associated with an increase of about 19.9 asthma ED visits per 10,000 people. About 9.6% of the variation in asthma visits is explained by PM2.5 concentration (multiple R-squared 0.096, adjusted R-squared 0.0955).

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_map)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16
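
#Reading the coefficients above as a fitted line gives the worked arithmetic below. The input of 10 µg/m3 is only an illustrative value chosen here, not one taken from the data.

```latex
\begin{align*}
\widehat{\text{Asthma}} &= -116.278 + 19.862 \times \text{PM2.5} \\
\text{PM2.5} = 10: \quad \widehat{\text{Asthma}} &= -116.278 + 198.62 = 82.342 \text{ visits per } 10{,}000 \\
\widehat{\text{Asthma}} = 0 &\iff \text{PM2.5} = \frac{116.278}{19.862} \approx 5.85
\end{align*}
```

#The fitted line therefore predicts a negative visit rate for any PM2.5 concentration below about 5.85 µg/m3, which is not physically meaningful.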

#The residuals of a least-squares fit with an intercept average to zero by construction, but here the median residual is -9.61 and the residuals range from -54.47 to 182.95, which reflects a strong right skew in the density curve of the residuals. This suggests that a straight best-fit line may not be appropriate for the underlying data, as the median of roughly symmetric residuals should be close to 0.
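
#The skew check described above can be sketched in R as follows. This is a minimal illustration on a made-up toy data frame (`toy` and its values are assumptions, not CalEnviroScreen data); with the real data the same calls would be applied to the model fitted on `ces4_map`.

```r
# Toy data with one very large response value (made-up numbers, illustration only).
toy <- data.frame(
  PM2.5  = c(6, 7, 8, 9, 10, 11, 12, 13),
  Asthma = c(20, 22, 25, 30, 28, 35, 40, 160)
)

fit <- lm(Asthma ~ PM2.5, data = toy)
res <- residuals(fit)

mean(res)      # numerically zero for a least-squares fit with an intercept
median(res)    # negative here: one large positive residual pulls the line up
quantile(res)  # min, quartiles, median and max of the residuals

# Density curve of the residuals; a long right tail indicates right skew.
plot(density(res), main = "Residual density (toy data)")
```

#Because the mean is pinned at zero, the median and the asymmetry between the minimum and maximum are the more informative diagnostics.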

#With log(Asthma) as the response, the median of the residuals (0.033) is close to 0, which suggests that a log-linear fit of asthma against PM2.5 is more appropriate for the underlying data. In addition, the best-fit line now appears to better represent the CalEnviroScreen distribution of asthma against PM2.5.

#An increase of 1 µg/m3 in PM2.5 concentration is associated with the asthma visit rate being multiplied by about 1.43 (e^0.35633), an increase of roughly 43%. About 10% of the variation in log asthma visits is explained by PM2.5 concentration.

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_map)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16
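
#Because the response is on the log scale, the PM2.5 coefficient acts multiplicatively once it is back-transformed. A short derivation from the coefficients above, where x stands for any PM2.5 concentration:

```latex
\begin{align*}
\log\big(\widehat{\text{Asthma}}(x)\big) &= 0.69234 + 0.35633\,x \\
\frac{\widehat{\text{Asthma}}(x+1)}{\widehat{\text{Asthma}}(x)}
  &= \frac{e^{0.69234 + 0.35633\,(x+1)}}{e^{0.69234 + 0.35633\,x}}
   = e^{0.35633} \approx 1.43
\end{align*}
```

#Each additional 1 µg/m3 of PM2.5 is thus associated with a predicted visit rate about 43% higher, regardless of the starting concentration.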

#The residuals are now roughly symmetric around 0 (median 0.033, quartiles -0.465 and 0.423), which suggests that regressing log(Asthma) on PM2.5 yields residuals that are closer to a normal distribution.

## [1] -2.00402
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.1737 ymin: 37.41911 xmax: -122.1492 ymax: 37.44193
## Geodetic CRS:  NAD83
## # A tibble: 1 × 2
##   `Census Tract`                                                        geometry
##            <dbl>                                              <MULTIPOLYGON [°]>
## 1     6085513000 (((-122.1737 37.42636, -122.1735 37.42706, -122.1734 37.4274, …
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.1737 ymin: 37.41911 xmax: -122.1492 ymax: 37.44193
## Geodetic CRS:  NAD83
## # A tibble: 1 × 2
##   `Approximate Location`                                                geometry
##   <chr>                                                       <MULTIPOLYGON [°]>
## 1 Stanford               (((-122.1737 37.42636, -122.1735 37.42706, -122.1734 3…
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.1737 ymin: 37.41911 xmax: -122.1492 ymax: 37.44193
## Geodetic CRS:  NAD83
## # A tibble: 1 × 2
##   Longitude                                                             geometry
##       <dbl>                                                   <MULTIPOLYGON [°]>
## 1     -122. (((-122.1737 37.42636, -122.1735 37.42706, -122.1734 37.4274, -122.…
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.1737 ymin: 37.41911 xmax: -122.1492 ymax: 37.44193
## Geodetic CRS:  NAD83
## # A tibble: 1 × 2
##   Latitude                                                              geometry
##      <dbl>                                                    <MULTIPOLYGON [°]>
## 1     37.4 (((-122.1737 37.42636, -122.1735 37.42706, -122.1734 37.4274, -122.1…

#The census tract with the lowest residual (-2.004 on the log scale) is at Stanford University. In this model, a large negative residual means that the observed asthma ED visit rate is far below the rate the model predicts from the tract's PM2.5 concentration, so the model over-estimates asthma visits there. A likely reason is that fewer ED visits are registered in Stanford: the age-adjusted rates account for the student population in that location, coupled with the constant inflow and outflow of students living in the area.
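
#The lookup of the lowest-residual tract can be sketched as follows. This is a minimal illustration under stated assumptions: `toy`, its `tract` labels and its values are made up, and the actual analysis would index `ces4_map` instead.

```r
# Toy stand-in for ces4_map (made-up values, illustration only).
toy <- data.frame(
  tract  = c("A", "B", "C", "D", "E"),
  PM2.5  = c(8, 9, 10, 11, 12),
  Asthma = c(20, 35, 5, 60, 90)
)

model <- lm(log(Asthma) ~ PM2.5, data = toy)

# Residuals are on the log scale: observed log(Asthma) minus fitted log(Asthma).
res <- residuals(model)

# Row with the most negative residual, i.e. the tract whose observed rate
# falls furthest below the model's prediction.
lowest <- which.min(res)
toy[lowest, ]

# Back-transform: ratio of the observed rate to the predicted rate for that tract.
ratio <- exp(res[lowest])
ratio
```

#Applied to the reported minimum of -2.00402, the same back-transformation gives exp(-2.00402), or roughly 0.13, meaning the observed rate at that tract is about 13% of what the model predicts.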